TIME SIMULATION TECHNIQUES TO DETERMINE NETWORK AVAILABILITY

Field of Invention

The invention is in the area of communications network analysis. In
particular, it is directed to simulation techniques for analyzing the
availability or unavailability of end-to-end network connections or services.

Background of Invention

Capacity planning is an important function in designing and provisioning
communication networks. While network link and node capacities have been
estimated for years, there has been relatively little study of availability,
especially for large mesh networks. Large mesh networks with multiple nodes and
links, and with arbitrary topology, are not very amenable to an exact analysis,
especially for multiple failures. The multiple failure case means that by the
time another failure occurs, repair processes for at least one previous failure
have not completed, so that there may be more than one failure to deal with at
any one time. Simple structured point-to-point or ring networks, for example,
may have 1+1 or ring protection mechanisms for single failures, e.g., a single
fiber cut at a time. The single failure case means that by the time a second
failure occurs, repair processes for the first failure have completed, so that
there is no more than one failure to deal with at any one time. In typically
route or geographically constrained networks of this kind, analytical and
approximate techniques can give insight and understanding of service
availability for each of any possible single failures. If, however, the network
is unstructured like a mesh, if the number of nodes is large, and if multiple
failures are considered, the calculations, even if approximate, quickly become
very complicated.

An article entitled "Computational and Design Studies on the Unavailability of
Mesh-restorable Networks" by Matthieu Clouqueur and Wayne D. Grover in
Proceedings of DRCN 2000, April 2000, Munich describes computational techniques
of unavailability of a mesh network for single and multiple (mainly two)
failures.

As mentioned in the above article, network availability generally refers to
the availability of specific paths (also called connections) and not that of a
whole network. Networks as a whole are never entirely up nor entirely down.
"Network availability" can be defined as the average availability of all
connections in a network but this gives less insight and comparative value than
working with individual paths, or perhaps a selection of characteristic
reference paths. Therefore, service availability between source and sink nodes
is more meaningful to communications users who pay for such services.

For a quantitative study of network availability, Figure 1 illustrates service
on a specific path as down (unavailable) in durations U1, U2, U3, ... Un along
the time axis. On the vertical axis (U - unavailability), 'u' indicates the
service as unavailable, and 'a' as available. Service availability over a period
T is the fraction of this period during which the service is up. The
availability and unavailability of the network (or service) therefore are
defined as follows:

    Availability   = lim {(T - ΣUi)/T} = MTTF/(MTTR+MTTF)
    Unavailability = 1 - Availability  = MTTR/(MTTR+MTTF)

where MTTR is the mean time to recover or repair, and MTTF is the mean time to
failure. Recovery is by relatively fast means of network protection (in tens of
milliseconds) or restoration (perhaps within a second) capabilities, whereas
repair is much longer (typically hours).
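
The two definitions above can be illustrated with a minimal Python sketch. The
function names and the sample outage durations are assumptions introduced here
for illustration only; they do not appear in the description.

```python
def availability_from_outages(period, outages):
    """Fraction of `period` during which the service is up.

    `outages` holds the unavailable durations U1, U2, ..., Un, expressed in
    the same time unit as `period`.
    """
    down = sum(outages)
    if period <= 0 or down > period:
        raise ValueError("outages must fit inside a positive observation period")
    return (period - down) / period


def steady_state_availability(mttf, mttr):
    """Long-run availability MTTF / (MTTF + MTTR), both in the same time unit."""
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    # Hypothetical observation: a 1000-hour window containing three outages.
    a = availability_from_outages(1000.0, [2.0, 0.5, 7.5])
    print(f"availability = {a:.4f}, unavailability = {1 - a:.4f}")
```

The first function follows the time-axis view of Figure 1, while the second
uses the MTTF and MTTR form of the same quantity.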

The above referenced article discusses computational approaches for analyzing
availability under a two-failure scenario. Such approaches are quite complex.

There is need for faster and easier techniques to determine service 
availability, especially in large mesh networks. Simulation provides tractability 
for large networks, and is also a good check on the accuracy of simple, 
approximate or analytical methods. Thus, the time simulation technique is a
relatively easier and faster process that complements more insightful analytical
approaches to availability.

Summary of Invention 

According to the basic concept, the present invention is a time simulation
technique for determining the service availability (or unavailability) of
end-to-end network connections (or paths) between source and sink nodes. In
accordance with one aspect, the invention is directed to a simulation technique
to determine either the network availability or unavailability.

In accordance with one aspect, the invention is directed to a time simulation
method of determining service availability of a communications network having a
plurality of nodes and a plurality of links. The same principles can be applied
to mesh networks or to other networks, such as ring networks. The method
includes steps of: (a) selecting a link between two network nodes; (b)
performing a simulated link failure on the selected link; (c) selecting a
connection between two network source and sink nodes; and (d) determining the
unavailability and availability of the connection under the simulated link
failure condition. The method further includes a step (e) of repeating (c) and
(a) and (b); and a step (f) of summing the unavailability and availability of
connections after each repetition until a predetermined number of connections
have been selected, and until a simulated link failure has been performed on
all links; or until the summed unavailability and availability has been
determined to converge, whichever is earlier. (A convergence process may be
used, for example, if an operator deems there to be too many failure scenarios
to consider exhaustively, or it is too time consuming to consider all failure
scenarios exhaustively.) The predetermined number of connections of step (f)
may be all connections or a predecided subset of connections.

In accordance with a further aspect, the invention is directed to a time
simulation apparatus for determining service availability of a mesh or other
communications network. The apparatus includes a network having a plurality of
nodes and a plurality of links; the links having attributes relating to their
failure, recovery and repair mechanisms. The apparatus further includes a
mechanism for selecting one of the plurality of links based on the attributes;
a failure/repair module for performing a simulated failure and repair on the
selected link; a mechanism for selecting a connection between source and sink
nodes; and an arithmetic mechanism for calculating availability of the selected
connection.

Other aspects and advantages of the invention, as well as the structure and 

operation of various embodiments of the invention, will become apparent to those 

ordinarily skilled in the art upon review of the following description of the 

invention in conjunction with the accompanying drawings. 


Brief Description of Drawings 

Embodiments of the invention will be described with reference to the
accompanying drawings, wherein:

Figure 1 is a time-related graph in which periods of unavailable service are
shown.

Figure 2 shows a meshed network with links and nodes, also showing a 
path or connection between source node A and sink node Z. 
Figure 3 is a flow diagram of the simulation technique according to one
embodiment of the invention.

Figure 4 shows a simple network for the purpose of illustrating the link
selection aspect of the invention.

Figure 5 is a graph showing an example probability density of link selection.

Figure 6 is a graph showing an example cumulative probability of link
selection generated from Figure 5.

Figure 7 shows example probability densities of TTF (time to failure).

Figure 8 is a uniform TTF probability density to illustrate details.

Figure 9 shows a simple network for the purpose of illustrating the TTF aspect
of the invention, similar to Figure 4 except that it shows a fiber cut on link
No. 4.

Figure 10 is a graph showing an exponential TTF probability density.

Figure 11 is a graph showing a cumulative probability distribution generated
from Figure 10.

Figure 12 is a schematic block diagram of the simulation technique according
to one embodiment.

Figure 13 is a hypothetical display of expected simulation results after one
or very few link failures, according to an embodiment of the invention.

Figure 14 is a hypothetical display of expected results after most or all link
failures, according to an embodiment of the invention.

Detailed Description of Preferred Embodiments of Invention 

Referring to Figure 2, a network has a plurality of nodes and links. The
present invention considers the service availability between specific source and
sink nodes. The service availability (unavailability) of a connection depends on
not only the availability of each link in the connection, but also that of all
other links, because failure of any link may affect the availability of the
connection under consideration - that is, other failed links may prevent
successful recovery (protection or restoration) of the connection.

In Figure 2, it is assumed that connections are already provisioned. The
problem therefore can be stated as follows.

There are N nodes and L links in the network, each link having length di.
There are C possible connections between source sink node pairs of type A and Z,
each connection using lj links. The connection distance CD is the sum of di's
over lj links per connection. The total network link distance TD is the sum of
di's over L network links.

The simulation goal is to determine how the link failure process affects the
connection availability between nodes A and Z. As mentioned earlier, the
availability is defined as:

    Connection unavailability = U   = MTTR/(MTTF+MTTR)
    Connection availability   = 1-U = MTTF/(MTTF+MTTR)

where MTTF follows from an average failure rate of F fiber cuts/(1000km*year)
and MTTR is either MTTRc (recovery time for effective protection/restoration)
or MTTRp (repair time for no, or ineffective, protection/restoration). Recovery
indicates protection or restoration, and restoration implies any reasonable
routing algorithm, e.g., least cost, shortest path, etc., per operator
preference.

Some examples are as follows: 

• If F = 2 fiber cuts/(1000km*year) and distance D = 5000km, MTTF =
  1000/(2*5000) = 0.1 years = 36.5 days.

• For the same link as above, if 50 ms is needed for effective protection;
  U = [.05/(3600*24)]/[36.5+.05/(3600*24)] => 0.000002%;
  A = 1-U => 99.999998% ~ 8 nines.

• For the same link as above, if 500 ms is needed for effective restoration;
  U = [.5/(3600*24)]/[36.5+.5/(3600*24)] => 0.00002%;
  A = 1-U => 99.99998% ~ 7 nines.

• For the same link as above, if 8 hours is needed for repair under no or
  ineffective protection/restoration;
  U = (8/24)/(36.5+8/24) => 0.9%;
  A = 1-U => 99.1% ~ 2 nines.
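
The arithmetic in the examples above can be checked with a short Python
sketch. The helper names are assumptions introduced here, and a 365-day year is
assumed, as implied by 0.1 years = 36.5 days.

```python
SECONDS_PER_DAY = 3600 * 24
DAYS_PER_YEAR = 365  # assumed, consistent with 0.1 years = 36.5 days


def mttf_days(cuts_per_1000km_year, distance_km):
    """MTTF in days for a fiber span, from MTTF = 1000/(F*D) years."""
    return 1000.0 / (cuts_per_1000km_year * distance_km) * DAYS_PER_YEAR


def unavailability(mttf, mttr):
    """U = MTTR / (MTTF + MTTR), both arguments in the same time unit."""
    return mttr / (mttf + mttr)


mttf = mttf_days(2, 5000)  # about 36.5 days

# MTTR expressed in days for the three cases discussed in the text.
cases = {
    "protection, 50 ms": 0.05 / SECONDS_PER_DAY,
    "restoration, 500 ms": 0.5 / SECONDS_PER_DAY,
    "repair, 8 hours": 8 / 24,
}

for label, mttr in cases.items():
    u = unavailability(mttf, mttr)
    print(f"{label}: U = {u * 100:.7f}%  A = {(1 - u) * 100:.7f}%")
```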

Figure 3 is a flow diagram of the algorithm according to one embodiment of the
invention. It is assumed that only link failures (F fiber cuts/1000km per year)
occur, since they tend to have a dominant effect on availability. Furthermore,
only single link failures are considered in Figure 3 - multiple link failures
are considered later. Node failures are not specifically considered here but
can be emulated by considering that all links emanating from a node fail
simultaneously - a particular multiple failure scenario. Referring to Figure 3,
the simulation algorithm for the network under discussion runs as follows:

(1) At 30, randomly select a network link i to fail ("distance weighted" as
described later);

(2) At 32, randomly fail selected network link i based on its TTF density
distribution (distance dependent as described later);

(3) At 34, randomly select time to repair (TTRp) based on its TTRp density (as
described later). (Note that one can also select times to recover (TTRc) based
on TTRc densities. But these times tend to be quite small and less variable
compared to repair times. So, here these times are fixed, e.g., at 50 ms for
protection, or e.g., at 500 ms for restoration.)

(4) At 36, select a connection (connection selection can be, e.g., sequential,
random, or be based on priority, as discussed later);

(5) At 38, decide if the selected connection is affected or not by the link
failure in (2) above;

(6) At 40, if the connection is unaffected, accumulate 0 (unavailable time
Ut = 0) for this failure on this connection, and proceed with cumulative
calculation of connection U and A (unavailability and availability) at 42
(cumulating will begin for subsequent failures);

(7) At 44, if the connection is affected, invoke the failure recovery scheme at
46 to determine whether or not the failure recovery scheme is effective at 48;

    At 50, if effective, accumulate unavailable time Ut = Utrecover for this
    affected connection and calculate cumulative connection U and A at 42;

    At 52, if ineffective, accumulate unavailable time Ut = Utrepair for this
    affected connection and calculate cumulative connection U and A at 42;

(Note: the failure recovery scheme will be by means of separate processes for
either protection or restoration, related to the capacity planning process for
allocating protection or restoration bandwidth, and for switching links or
rerouting connections over this bandwidth to avoid failures.)

(8) At 54, if not all the connections have been selected, go back to 36 to
repeat for all connections (or for any subset, per operator preference),
continue to calculate Ut (Ut = 0, or Utrecover, or Utrepair, as applicable) for
each connection and calculate cumulative connection U and A at 42;

(9) At 56, determine if all links (or sufficient links, or specified links, per
operator preference) have been selected to fail at least once (or more often,
per operator preference);

    if yes, end at 58;

    if no, determine if U and A converge (e.g., per operator preference, to
    save simulation time, or if U and A are changing so little, or are already
    adequate enough to not warrant further simulation);

(10) At 60, if convergence, end at 58;

    if no, go back to 30 to select another link to fail and repeat the
    procedure to continue for all failure combinations or until convergence.
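
Steps (1) to (10) can be sketched in Python as below. This is only one
possible reading of Figure 3 for the single failure case, under several
assumptions that are not stated in the description: a connection is treated as
affected when the failed link lies on its path, exponential TTF and TTRp
densities are used, the simulated clock advances by TTF + TTRp per failure, the
convergence test of step (10) is replaced by a fixed number of failures, and
`recovery_ok` is a hypothetical stand-in for the separate recovery scheme.

```python
import random

# Fixed 50 ms recovery time expressed in years (protection case in the text).
RECOVER_YEARS = 0.05 / (3600 * 24 * 365)


def simulate(links, connections, recovery_ok, n_failures,
             mttrp_years, f_cuts=2.0, seed=0):
    """Return per-connection unavailability U after `n_failures` link failures.

    links:        {link_id: distance_km}
    connections:  {conn_id: set of link_ids on the connection's path}
    recovery_ok:  callable(conn_id, link_id) -> bool (hypothetical hook)
    """
    rng = random.Random(seed)
    ids = list(links)
    weights = [links[i] for i in ids]          # distance weighting, di/TD
    down = {c: 0.0 for c in connections}       # accumulated Ut per connection
    elapsed = 0.0                              # simulated time in years

    for _ in range(n_failures):
        link = rng.choices(ids, weights=weights)[0]       # step (1)
        mttf = 1000.0 / (f_cuts * links[link])            # years
        ttf = rng.expovariate(1.0 / mttf)                 # step (2)
        ttrp = rng.expovariate(1.0 / mttrp_years)         # step (3)
        elapsed += ttf + ttrp

        for conn, path in connections.items():            # steps (4) and (8)
            if link not in path:                          # steps (5), (6)
                ut = 0.0
            elif recovery_ok(conn, link):                 # step (7), effective
                ut = RECOVER_YEARS
            else:                                         # step (7), ineffective
                ut = ttrp
            down[conn] += ut

    # Cumulative U per connection; A is 1 - U.
    return {c: down[c] / elapsed for c in connections}
```

A call such as `simulate({1: 100.0, 2: 200.0}, {"AZ": {1, 2}}, lambda c, l:
False, 200, 8 / (24 * 365))` models a connection with no effective recovery and
a mean repair time of 8 hours; all of these inputs are illustrative.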

Link Selection 

Per operator preference, there may be many ways to select a link to fail,
e.g., sequentially, randomly, all, or selected subset, from experience, etc.
However, based on the characteristic of F fiber cuts/(1000km*year), a longer
link is more likely to fail, so, as one example, the link distance (di)
weighted probability is introduced here to select a link to fail. The selection
probability is di/TD (the ratio of link distance di to total network link
distance TD). At 30 in Figure 3, links will be selected according to these
probabilities. In this way, longer links get selected with correspondingly
higher probability. For example, if one link has twice the distance of another,
the probability that that link is selected is twice that of the other.

Per operator preference, selection could be with replacement (since the same
link can fail more than once), or without replacement (e.g., to speed
simulation time and/or to have more links covered).

To illustrate selection of links to fail, Figure 4 shows a simple network with
link parameters as follows:

    Link No. i    Distance di (km)    Probability of selection = di/TD

In the table above, link numbers and their distances are shown together with
their distance-weighted probability of selection di/TD. Figure 5 is a graph
showing the probability density of link selection vs link distance. Figure 6
shows the cumulative probability distribution of link selection derived from
Figure 5. (In Figures 5 and 6, the X-axis happens to show link distance ordered
from longest to shortest, but this ordering is not necessary.) A uniform random
number generator drives the link selection mechanism, that is, the generator
generates a random number between 0 and 1 shown on the Y axis and selects a
corresponding link shown on the X axis. For example, a random number of 0.7
would select link No. 4, as shown.
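
The cumulative-distribution lookup described above can be sketched in Python
as follows. The link distances are hypothetical values chosen for illustration
(only the 200 km length of link No. 4 is taken from the description), and the
function names are assumptions.

```python
import bisect
import itertools


def build_cumulative(distances):
    """Return (link_ids, cumulative probabilities), links ordered longest first.

    Each link's selection probability is di/TD, where TD is the sum of all
    link distances.
    """
    ordered = sorted(distances.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(d for _, d in ordered)
    ids = [link_id for link_id, _ in ordered]
    cum = list(itertools.accumulate(d / total for _, d in ordered))
    cum[-1] = 1.0  # guard against floating-point drift in the last step
    return ids, cum


def select_link(ids, cum, r):
    """Map a uniform random number r in [0, 1) to a link via the cumulative curve."""
    return ids[bisect.bisect_right(cum, r)]


# Hypothetical distances in km, keyed by link number.
distances = {1: 300.0, 2: 250.0, 3: 150.0, 4: 200.0, 5: 100.0}
ids, cum = build_cumulative(distances)
print(select_link(ids, cum, 0.7))  # falls in the band of link No. 4 here
```

With these assumed distances the cumulative steps are 0.30, 0.55, 0.75, 0.90
and 1.00, so a random number of 0.7 lands in the third band, which belongs to
link No. 4.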

Although this is one way of selecting links to fail, other criteria can be
considered per operator preference. For example, link infrastructure type
(aerial versus buried) or location (city versus country) may be more critical
to fiber cuts than just link distance. In such cases, more or less weight is
given to such links and corresponding alternatives to Figures 5 and 6 can be
used.

TTF (time to failure) selection

Like the link selection mechanism discussed above, a random number generator
generates a random number, which selects a TTF from a TTF distribution with
MTTF. Distributions are preferably based on operator experience, but can be any
distribution per operator preference. Example TTF densities are uniform,
normal, exponential, as shown in Figure 7. Figure 8 shows a generalized uniform
TTF density to explain some of the parameters in more detail. For fiber cuts,
MTTF = 1000/(F*di), where F is the average number of fiber cuts per 1000 km per
year and di is the link fiber length in km. The uniform density ranges from
"min" to "max", where "min" >= 0 and "max" = 2MTTF-min <= 2MTTF. The density on
the Y-axis is determined by 1/(max-min) = 1/[2(MTTF-min)] >= 1/(2MTTF).
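
A minimal Python sketch of the uniform TTF density parameters follows. The
function names are assumptions; the relations max = 2MTTF - min and density =
1/(max - min) are those given above.

```python
import random


def mttf_years(f_cuts, distance_km):
    """MTTF = 1000/(F*di) in years, F in cuts per 1000 km per year."""
    return 1000.0 / (f_cuts * distance_km)


def uniform_ttf_bounds(mttf, minimum=0.0):
    """Return (min, max) of a uniform TTF density whose mean equals MTTF."""
    if not 0.0 <= minimum < mttf:
        raise ValueError("minimum must satisfy 0 <= min < MTTF")
    return minimum, 2.0 * mttf - minimum


def uniform_density(mttf, minimum=0.0):
    """Height of the uniform density, 1/(max - min) = 1/[2(MTTF - min)]."""
    lo, hi = uniform_ttf_bounds(mttf, minimum)
    return 1.0 / (hi - lo)


def sample_uniform_ttf(mttf, minimum=0.0, rng=random):
    """Draw one TTF value from the uniform density."""
    lo, hi = uniform_ttf_bounds(mttf, minimum)
    return rng.uniform(lo, hi)


if __name__ == "__main__":
    m = mttf_years(2, 200)  # 2.5 years for a 200 km link
    print(uniform_ttf_bounds(m, 0.5), uniform_density(m, 0.5))
```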

Another critical aspect of the TTF density is whether times to failure can be
smaller than times to repair (TTRp). For TTF > TTRp, only single failure cases
will occur (as addressed at present), but if TTF < TTRp, multiple failures can
occur and have a relatively higher impact on availability (single and multiple
failure cases were explained earlier). The granularity of TTF samples is
preferably less than 1/10th of minimum repair time, for reasonably accurate
availability assessment during multiple failures (repair time selection is
discussed later).

Analogous to link selection discussed earlier, TTF densities are used for TTF
selection as follows. Figure 9 is the same network as in Figure 4 except that
it shows a failure in link No. 4. Links are assumed to have an exponential TTF
density as shown in Figure 10. This density would approximately apply, for
example, if an operator found that failures tended to bunch together in time.
TTF is selected as follows. Figure 11 is the TTF cumulative probability
distribution, corresponding to Figure 10. In Figures 10 and 11, MTTF of link
No. 4 is shown for reference. Link No. 4 has a distance of d4 = 200 km and has
an average of F = 2 fiber cuts per 1000 km per year. From MTTF = 1000/(F*di),
this translates to MTTF = 2.5 years, which corresponds to an exponential
cumulative probability of 0.63.

Like the selection mechanism for links, a uniform random number generator
drives TTF selection. For example, in Figure 11, a random number of 0.5 selects
TTF = 1.7 years for link No. 4.
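
The lookup on the cumulative curve of Figure 11 corresponds to inverting the
exponential distribution. A minimal Python sketch, with function names that are
assumptions introduced here:

```python
import math


def exponential_cdf(t, mttf):
    """Cumulative probability that failure occurs by time t (mean MTTF)."""
    return 1.0 - math.exp(-t / mttf)


def ttf_from_random(r, mttf):
    """Invert the exponential CDF: map a uniform r in [0, 1) to a TTF value."""
    if not 0.0 <= r < 1.0:
        raise ValueError("r must lie in [0, 1)")
    return -mttf * math.log(1.0 - r)


if __name__ == "__main__":
    mttf = 1000.0 / (2 * 200)  # link No. 4: 2.5 years
    print(f"CDF at MTTF  = {exponential_cdf(mttf, mttf):.2f}")
    print(f"TTF at r=0.5 = {ttf_from_random(0.5, mttf):.1f} years")
```

For MTTF = 2.5 years this gives a cumulative probability of about 0.63 at
t = MTTF and a TTF of about 1.7 years for a random number of 0.5, in line with
the values quoted for link No. 4.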

As with link selection above, there may be different TTF distributions for
different links under different conditions. The distribution for each link
could be based on experience in terms of infrastructure type (aerial, buried),
type of right-of-way (beside railroad, in pipeline), location (city, country),
proneness to construction activity or disasters (accidents, floods,
earthquakes), etc.

Once the TTF value is determined, a TTF counter is set to the TTF value 
and is decremented every clock increment. The selected link is considered to fail 
when the counter reaches 0.

TTRp (time to repair) selection

Like the TTF selection mechanism above, a random number selects a value from a
TTRp distribution with MTTRp. When a link failure occurs, a link TTRp counter
is started with the selected TTRp value and then is decremented every clock
increment until the counter reaches 0, at which time the link is considered
repaired and is returned to the network for service. Note that in multiple
failure cases, there may be more than one such counter running at any one time.
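
The TTF and TTRp counters can be sketched in Python as below. The class and
function names are assumptions, counter values are given in clock increments,
and the "repaired" state simply marks the point at which a full simulation
would return the link to service and draw a fresh TTF.

```python
from dataclasses import dataclass


@dataclass
class LinkCounters:
    ttf: int            # clock increments remaining until the link fails
    ttrp: int           # clock increments needed to repair the link
    state: str = "up"   # "up", "down" or "repaired"

    def tick(self):
        """Advance this link by one clock increment."""
        if self.state == "up":
            self.ttf -= 1
            if self.ttf <= 0:
                self.state = "down"       # TTF counter reached 0: link fails
        elif self.state == "down":
            self.ttrp -= 1
            if self.ttrp <= 0:
                self.state = "repaired"   # TTRp counter reached 0


def max_concurrent_failures(links, n_ticks):
    """Run the clock and report the largest number of links down at once."""
    worst = 0
    for _ in range(n_ticks):
        for link in links:
            link.tick()
        worst = max(worst, sum(link.state == "down" for link in links))
    return worst


if __name__ == "__main__":
    # Second link fails before the first is repaired: a multiple failure case.
    print(max_concurrent_failures([LinkCounters(3, 5), LinkCounters(6, 4)], 12))
```

When the result is greater than 1, more than one TTRp counter was running at
the same time, which is the multiple failure case noted above.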

As with link and TTF selections above, in generating the TTRp distribution, or
distributions, per operator preference it is possible to account for numerous
effects, e.g., demographics, infrastructure, age of equipment, seasonal and
time-of-day effects, work force, etc.

As noted earlier, fixed times to recover (TTRc) are used (since recovery times
are very small compared to repair times).

It should be noted that Figure 3 excludes details of clock increments and of
setting, decrementing and tracking TTF and TTRp counters, etc.

Connection selection 

Here, for simplicity, all connections are randomly selected without
replacement. This can be done using a uniform density and corresponding linear
distribution of connections, together with a random number generator for
selection, entirely similar to the other selection processes already discussed
above.

However, how connections are selected may affect availability results. For
instance, under multiple failure conditions, connections selected earlier have
a better chance of recovering and of having higher availability than those
selected later.

Thus, connections can be selected according to various criteria per operator
preference, that is: sequentially, randomly, with/without priority (e.g.,
mission critical vs best effort traffic), all, or specific subset (e.g.,
reference connections), etc.
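
The selection criteria listed above can be sketched in Python as follows. The
function name, the `mode` values and the convention that a lower priority
number is served first are assumptions made for illustration.

```python
import random


def connection_order(connections, mode="random", priority=None,
                     subset=None, seed=None):
    """Return the order in which connections are examined after a failure.

    connections: list of connection identifiers
    mode:        "sequential", "random" (without replacement) or "priority"
    priority:    {connection: rank}, lower rank first (used by "priority")
    subset:      optional collection restricting which connections are used
    """
    pool = [c for c in connections if subset is None or c in subset]
    if mode == "sequential":
        return pool
    if mode == "random":
        rng = random.Random(seed)
        return rng.sample(pool, len(pool))   # each connection exactly once
    if mode == "priority":
        return sorted(pool, key=lambda c: priority[c])
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    conns = ["A-Z", "B-Y", "C-X"]
    ranks = {"A-Z": 2, "B-Y": 1, "C-X": 3}   # hypothetical priorities
    print(connection_order(conns, mode="priority", priority=ranks))
```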

Figure 12 is a block diagram of the simulation apparatus 100 according to 
one embodiment of the invention. 

Referring to Figure 12, a database 105 holds network data on the network and
on individual nodes, links and connections. As shown, the data identifies
resources, and includes distances, selection criteria, failure, recovery and
repair data, etc. A generator 110 generates random numbers by which a selector
120 selects links (and/or nodes) and their failures, and by which selected
connections are affected or not, according to the stored data concerning link,
node and connection attributes. The link attributes include the distance, TTF,
TTRc, TTRp, etc. Once a link and a connection are selected, a simulation
mechanism 115 performs simulated failure and restoration processes under the
control of clock increments. The clock 125 generates clock increments, which
are calibrated to correspond to a specific real time interval - for instance,
one clock increment might be 1/1000 of real time. An arithmetic module 130
calculates the availability or unavailability of the selected connection and
thereafter the service availability of the network. Finally, the availability
is displayed on a display module 135.

Figures 13 and 14 are hypothetical histograms of expected connection
availability performance after the first few failures and after many failures,
respectively. These results could be displayed on a display module. Over the
simulation time, Figure 13 migrates to Figure 14, showing how each connection's
availability is affected as more failures are encountered. Figure 14 is an
example of what may be a useful way for simulation results to be summarized and
presented to an operator. For example, the average availability is an
indication of the overall network availability, and it would also be evident
how many and which connections provide high availability (e.g., at least
99.999%), etc. However, specific connections and their availability are also
identifiable on the X-axis, for example, connections known by the operator to
carry critical services. Further, it could be made possible for the operator to
select any such connection and get a log of its simulation details, e.g., as to
the route it took, its distance, number of hops it went through, which failures
affected it, if recovery was successful, etc.
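
The kind of summary described for Figure 14 can be sketched in Python as
below. The function name, the result keys and the sample availability values
are assumptions; no simulation results are reported in the description.

```python
def summarize(availability, threshold=0.99999):
    """Summarize per-connection availability for presentation to an operator.

    availability: {connection: availability as a fraction between 0 and 1}
    threshold:    cut-off for "high availability" (99.999% by default)
    """
    values = list(availability.values())
    high = sorted(c for c, a in availability.items() if a >= threshold)
    return {
        "average": sum(values) / len(values),   # overall network indication
        "high_count": len(high),                # how many connections qualify
        "high_connections": high,               # which connections qualify
    }


if __name__ == "__main__":
    # Hypothetical per-connection results.
    results = {"A-Z": 0.999999, "B-Y": 0.991, "C-X": 0.99999}
    print(summarize(results))
```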

While the invention has been described according to what are presently
considered to be the most practical and preferred embodiments, it must be
understood that the invention is not limited to the disclosed embodiments.
Those ordinarily skilled in the art will understand that various modifications
and equivalent structures and functions may be made without departing from the
spirit and scope of the invention as defined in the claims. Therefore, the
invention as defined in the claims must be accorded the broadest possible
interpretation so as to encompass all such modifications and equivalent
structures and functions.